RTC: Fix RichTextData deserialization#76607
Conversation
…getPostChangesFromCRDTDoc()
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
|
Size Change: +207 B (0%) Total Size: 8.76 MB
ℹ️ View Unchanged
|
| isRichTextAttribute( blockName, key ) && | ||
| typeof value === 'string' | ||
| ) { | ||
| newAttributes[ key ] = RichTextData.fromHTMLString( value ); |
There was a problem hiding this comment.
Following this call stack, we eventually land here:
Is this cheap enough (given the Document re-use) that we don't need to cache this operation?
|
|
||
| for ( const [ key, value ] of Object.entries( attributes ) ) { | ||
| if ( | ||
| isRichTextAttribute( blockName, key ) && |
There was a problem hiding this comment.
The serialize function relies on the value being an instant of RichTextData while the deserialize relies on the attribute for the value mentioning that its rich-text. Would a block author always mention that schema type, or could they skip it and then this path wouldn't be triggered? Should we account for the less than ideal path?
There was a problem hiding this comment.
Would a block author always mention that schema type
We have been assuming this contract will be upheld in general. We definitely should not go down the path of trying to predict the attribute type based on its value.
There was a problem hiding this comment.
Totally, we should follow the path of assuming that the contract will be upheld but I wanted to point it out 🙏🏾
There was a problem hiding this comment.
@ingeniumed That's a good point that there's asymmetry in the way we handle type conversion with instanceof vs attribute type, but I think this is the safest path. During serialization we want to ensure we're only using RichText methods to grab a string for a Y.Text if it's an actual RichText object, and during deserialization we have no other way of knowing. Everything is a string after serialization.
The Block Attributes docs page contains this:
The attribute definition will contain, at a minimum, either a
typeor anenum. There may be additional fields.
I think it's safe to follow convention to make sure our processing follows the defined types as closely as we can to Gutenberg's existing behavior, but we can see if this has any unexpected effects.
| for ( const [ key, value ] of Object.entries( attributes ) ) { | ||
| if ( | ||
| isRichTextAttribute( blockName, key ) && | ||
| typeof value === 'string' |
There was a problem hiding this comment.
The deserialize function, if I'm reading it right, doesn't account for arrays the way the serialize function does (tables for instance). Wouldn't the new rich-text conversion not apply if this kind of text existed in a table?
There was a problem hiding this comment.
You're right! In #76597 we added support for properly serializing those deeply nested rich-text values, but not here. I'll add a test and fix that.
|
Flaky tests detected in 7a0c78e. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/23269085270
|
…g block schema for deeply nested strings
|
@ingeniumed @chriszarate Thank you for your reviews above! I added a couple of changes:
|
| * @param value The HTML string to parse. | ||
| * @return The RichTextData instance. | ||
| */ | ||
| function cachedFromHTMLString( value: string ): RichTextData { |
There was a problem hiding this comment.
It'd be valuable to add a test for this as well. If it's possible, the cache being evicted would be handy as well.
There was a problem hiding this comment.
Good thinking! Added tests for the cache eviction behavior in 7a0c78e, as well as moving the related functions to their own file.
* Add failing unit test for RichTextData conversion * Fix RichText attribute type via deserializeBlockAttributes() call in getPostChangesFromCRDTDoc() * Add test for nested RichText value * Modify deserializeAttributeValue() to recurse arrays and object, using block schema for deeply nested strings * Remove deserializeBlockAttributeValues to avoid 3 different deserialization functions * Add factory wrapper for rich text cache, testing Co-authored-by: alecgeatches <alecgeatches@git.wordpress.org> Co-authored-by: chriszarate <czarate@git.wordpress.org> Co-authored-by: ingeniumed <ingeniumed@git.wordpress.org>
|
I just cherry-picked this PR to the wp/7.0 branch to get it included in the next release: 2af539b |
What?
Fixes an issue we've observed with the
jetpack-mu-wpcomcode block override that causes code content to be cleared out when another user loads the editor:code-block-reproduction.mov
This in particular resulted from the custom code block accessing a
RichTextattribute value viaattributes.content?.text. Intrunk, our CRDT code caused theattributes.contentvalue to be astringinstead ofRichTextDataso the.textproperty failed to exist.Why?
The Jetpack enhanced code block (and any block whose edit component accesses
content.textand expects a rich text object) breaks when a second user joins a collaborative session. The CRDT serialization path correctly convertsRichTextDatato strings when writing to the Y.Doc (serializeAttributeValue), but the deserialization path has no corresponding conversion back.This meant blocks received from CRDT had
contentas a raw string, socontent.text(aRichTextDatagetter) wasundefined. The edit component then initialized with empty content and wrote that back, wiping the block's content entirely.How?
Added
deserializeBlockAttributesas the inverse of the existing serialization path. WhengetPostChangesFromCRDTDocextracts blocks from the CRDT document, it now converts any rich-text string attributes back toRichTextDataviaRichTextData.fromHTMLString(). This ensures block edit components receive the same attribute types regardless of whether blocks were created locally or arrived via CRDT sync.Testing Instructions
Unit test
You can see the test failing before the fix here: https://github.com/WordPress/gutenberg/actions/runs/23220557291/job/67491959921?pr=76607.
Manual testing
Manual testing with
jetpack-mu-wpcom-specific blocks is difficult locally, so I've included test files that can be used locally to reproduce the same issue. Here's what the reproduction looks like using the "RTC Test Code Block":local-block-reproduction.mov
Test code block with fix disabled to show local reproduction
Here's how to run this locally to reproduce the bug against
trunk:Enable RTC via Settings -> Writing -> "Enable real-time collaboration" checkbox.
Create a
rtc-test-code-block/folder at the root of yourgutenbergrepository. Add these two files:rtc-test-code-block.phprtc-test-code-block.jsThese implement a block that reads from content via
attributes.content?.textand writes it back to the block attributes, replicating the same bug.Add the "plugin" to your
.wp-env.json:{ "$schema": "./schemas/json/wp-env.json", "testsEnvironment": false, "core": "WordPress/WordPress", - "plugins": [ "." ], + "plugins": [ ".", "./rtc-test-code-block" ], /* ... */ }Activate the plugin and open a draft post. Insert a "RTC Test Code Block", a heading block, and another "RTC Test Code Block", each with content.
Open the same post in a second browser.
After the collaborator list appears, the second user should see the code block contents cleared out.
Try again on this branch (
fix/rich-text-deserialization) and the content should stay.Use of AI Tools
AI assistance: Yes
Tool(s): Claude Code
Used for: Diagnosing the CRDT serialization problem and implementing the fix.